Introduction

After what was analyzed in the lectures, I’m interested to see if we can visualize a high level of comorbidity between smoking, exercising and COVID-19 within the dataset. Before I try to put a model in place, I need to find external datasets about smoking and physical exercise such that I may be able to make any further analysis. I will attempt to relate the datasets through the country column.

Data

COVID-19

The COVID-19 data for this report consists of 2 CSVs that you can find here.

Each one represents the confirmed cases and deaths worldwide.

Confirmed Cases

## # A tibble: 603 Ă— 3
## # Groups:   country [201]
##    country     year   cases
##    <chr>       <chr>  <dbl>
##  1 Afghanistan 2020   52330
##  2 Afghanistan 2021  158084
##  3 Afghanistan 2022  204610
##  4 Albania     2020   58316
##  5 Albania     2021  210224
##  6 Albania     2022  333161
##  7 Algeria     2020   99610
##  8 Algeria     2021  218432
##  9 Algeria     2022  270952
## 10 Andorra     2020    8049
## # … with 593 more rows

Confirmed Deaths

## # A tibble: 603 Ă— 3
## # Groups:   country [201]
##    country     year   cases
##    <chr>       <chr>  <dbl>
##  1 Afghanistan 2020   52330
##  2 Afghanistan 2021  158084
##  3 Afghanistan 2022  204610
##  4 Albania     2020   58316
##  5 Albania     2021  210224
##  6 Albania     2022  333161
##  7 Algeria     2020   99610
##  8 Algeria     2021  218432
##  9 Algeria     2022  270952
## 10 Andorra     2020    8049
## # … with 593 more rows

Tobacco Atlas

For cigarette consumption I will use the dataset avaialable throught theTobacco Atlas available here.

Fields

  1. Country
  2. Average daily number of cigarettes consumed per adult (15+ yr) smoker, 2019

Ipsos Global Advisor

Global Views on Exercise and Team Sports

For the exercise information I will use the dataset available here.

Fields

  1. Country
  2. Mean Number of Hours Physical Excercise Per Week

United Nations

Department of Economic and Social Affairs, World Population Prospects 2022

For the age information I will use the dataset available here.

Fields

  1. Country
  2. Median Age

Table Joins

Now that we have the data loaded, let’s join all of the different tables by country and year

## # A tibble: 69 Ă— 8
## # Groups:   country [23]
##    country   year     cases deaths mortality consumption weekly_excerc…¹ media…²
##    <chr>     <chr>    <dbl>  <dbl>     <dbl>       <dbl>           <dbl>   <dbl>
##  1 Argentina 2020   1625514  43245     2.66         978.             4.4    31.0
##  2 Argentina 2021   5654408 117169     2.07         978.             4.4    31.0
##  3 Argentina 2022   9721718 130011     1.34         978.             4.4    31.0
##  4 Australia 2020     28425    909     3.20         668.             6.2    36.7
##  5 Australia 2021    425496   2253     0.529        668.             6.2    36.7
##  6 Australia 2022  10487217  15881     0.151        668.             6.2    36.7
##  7 Belgium   2020    646496  19528     3.02        1284.             6.9    40.8
##  8 Belgium   2021   2105343  28331     1.35        1284.             6.9    40.8
##  9 Belgium   2022   4624251  33000     0.714       1284.             6.9    40.8
## 10 Brazil    2020   7681032 195072     2.54         330.             3      32.4
## # … with 59 more rows, and abbreviated variable names
## #   ¹​weekly_excercise_hours_mean, ²​median_age

Mortality

Let’s graph the mortality rate before the Vaccine came out (August 2021).

Models

This is the model summary for cigarette consumption

## 
## Call:
## lm(formula = deaths ~ consumption, data = covid_stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -162110  -93661  -20662   38908  531415 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 194815.73   31870.81   6.113 5.64e-08 ***
## consumption   -113.89      33.39  -3.410   0.0011 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 130500 on 67 degrees of freedom
## Multiple R-squared:  0.1479, Adjusted R-squared:  0.1352 
## F-statistic: 11.63 on 1 and 67 DF,  p-value: 0.001103

This is the model summary for weekly excercise mean

## 
## Call:
## lm(formula = deaths ~ weekly_excercise_hours_mean, data = covid_stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -124917  -80723  -33363   12928  557159 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   162385      42162   3.851 0.000265 ***
## weekly_excercise_hours_mean   -10296       6415  -1.605 0.113182    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 138800 on 67 degrees of freedom
## Multiple R-squared:  0.03703,    Adjusted R-squared:  0.02266 
## F-statistic: 2.576 on 1 and 67 DF,  p-value: 0.1132

And this is the model summary for the median age

## 
## Call:
## lm(formula = deaths ~ median_age, data = covid_stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -162708  -72866  -31953   40166  549404 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   402605      94665   4.253 6.69e-05 ***
## median_age     -8124       2508  -3.240  0.00186 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 131500 on 67 degrees of freedom
## Multiple R-squared:  0.1354, Adjusted R-squared:  0.1225 
## F-statistic: 10.49 on 1 and 67 DF,  p-value: 0.001865

Graphing the models

Cigarette Consumption

Weekly Excercise

Age

Bias & Conclusion

Finally, please find the session info below.

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] scales_1.2.1     plotly_4.10.1    lubridate_1.9.0  timechange_0.1.1
##  [5] knitr_1.40       forcats_0.5.2    stringr_1.4.1    dplyr_1.0.10    
##  [9] purrr_0.3.5      readr_2.1.3      tidyr_1.2.1      tibble_3.1.8    
## [13] ggplot2_3.4.0    tidyverse_1.3.2 
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.2.1    digest_0.6.30       utf8_1.2.2         
##  [4] R6_2.5.1            cellranger_1.1.0    backports_1.4.1    
##  [7] reprex_2.0.2        evaluate_0.18       highr_0.9          
## [10] httr_1.4.4          pillar_1.8.1        rlang_1.0.6        
## [13] curl_4.3.3          lazyeval_0.2.2      googlesheets4_1.0.1
## [16] readxl_1.4.1        rstudioapi_0.14     data.table_1.14.4  
## [19] jquerylib_0.1.4     rmarkdown_2.17      labeling_0.4.2     
## [22] googledrive_2.0.0   htmlwidgets_1.5.4   bit_4.0.4          
## [25] munsell_0.5.0       broom_1.0.1         compiler_4.2.1     
## [28] modelr_0.1.9        xfun_0.34           pkgconfig_2.0.3    
## [31] htmltools_0.5.3     tidyselect_1.2.0    viridisLite_0.4.1  
## [34] fansi_1.0.3         crayon_1.5.2        tzdb_0.3.0         
## [37] dbplyr_2.2.1        withr_2.5.0         grid_4.2.1         
## [40] jsonlite_1.8.3      gtable_0.3.1        lifecycle_1.0.3    
## [43] DBI_1.1.3           magrittr_2.0.3      vroom_1.6.0        
## [46] cli_3.4.1           stringi_1.7.8       cachem_1.0.6       
## [49] farver_2.1.1        fs_1.5.2            xml2_1.3.3         
## [52] bslib_0.4.1         ellipsis_0.3.2      generics_0.1.3     
## [55] vctrs_0.5.0         tools_4.2.1         bit64_4.0.5        
## [58] glue_1.6.2          crosstalk_1.2.0     hms_1.1.2          
## [61] parallel_4.2.1      fastmap_1.1.0       yaml_2.3.6         
## [64] colorspace_2.0-3    gargle_1.2.1        rvest_1.0.3        
## [67] haven_2.5.1         sass_0.4.2